可见的红外人员重新识别(VI-REID)是与可见和红外形态相同的个人匹配的任务。它的主要挑战在于由在不同光谱上运行的相机引起的模态差距。现有的VI-Reid方法主要集中于跨模式学习的一般特征,通常是以特征可区分性为代价。为了解决这个问题,我们提出了一个基于周期的新型网络,用于中性但歧视性特征学习,称为环形。具体而言,Cycletrans使用轻巧的知识捕获模块(KCM)根据伪查询从与模态相关的特征地图捕获丰富的语义。之后,根据模态 - 欧罗威兰原型将这些特征转换为中性特征,将差异建模模块(DMM)部署为中性。为了确保特征可区分性,进一步部署了另外两个KCMs以进行特征周期结构。通过自行车结构,我们的方法可以在保留其出色的语义的同时学习有效的中性特征。在SYSU-MM01和REGDB数据集上进行的广泛实验验证了环形验证的优点针对最先进的方法,在SYSU-MM01中排名1的 +4.57%,REGDB中排名1 +2.2%。
translated by 谷歌翻译
尽管近年来人的重新识别取得了令人印象深刻的改善,但在实际应用程序场景中,由不同的障碍引起的常见闭塞案例仍然是一个不稳定的问题。现有方法主要通过采用额外网络提供的身体线索来区分可见部分,以解决此问题。然而,助理模型和REID数据集之间的不可避免的域间隙极大地增加了获得有效和有效模型的困难。为了摆脱额外的预训练网络并在端到端可训练网络中实现自动对齐,我们根据两个不言而喻的先验知识提出了一种新型的动态原型掩码(DPM)。具体而言,我们首先设计了一个层次蒙版生成器,该层面生成器利用层次的语义选择高质量的整体原型和闭塞输入图像的特征表示之间的可见图案空间。在这种情况下,可以自发地在选定的子空间中很好地对齐。然后,为了丰富高质量整体原型的特征表示并提供更完整的特征空间,我们引入了一个头部丰富模块,以鼓励不同的头部在整个图像中汇总不同的模式表示。对被遮挡和整体人员重新识别基准进行的广泛的实验评估证明了DPM优于最先进的方法。该代码在https://github.com/stone96123/dpm上发布。
translated by 谷歌翻译
When using LiDAR semantic segmentation models for safety-critical applications such as autonomous driving, it is essential to understand and improve their robustness with respect to a large range of LiDAR corruptions. In this paper, we aim to comprehensively analyze the robustness of LiDAR semantic segmentation models under various corruptions. To rigorously evaluate the robustness and generalizability of current approaches, we propose a new benchmark called SemanticKITTI-C, which features 16 out-of-domain LiDAR corruptions in three groups, namely adverse weather, measurement noise and cross-device discrepancy. Then, we systematically investigate 11 LiDAR semantic segmentation models, especially spanning different input representations (e.g., point clouds, voxels, projected images, and etc.), network architectures and training schemes. Through this study, we obtain two insights: 1) We find out that the input representation plays a crucial role in robustness. Specifically, under specific corruptions, different representations perform variously. 2) Although state-of-the-art methods on LiDAR semantic segmentation achieve promising results on clean data, they are less robust when dealing with noisy data. Finally, based on the above observations, we design a robust LiDAR segmentation model (RLSeg) which greatly boosts the robustness with simple but effective modifications. It is promising that our benchmark, comprehensive analysis, and observations can boost future research in robust LiDAR semantic segmentation for safety-critical applications.
translated by 谷歌翻译
In recent years, arbitrary image style transfer has attracted more and more attention. Given a pair of content and style images, a stylized one is hoped that retains the content from the former while catching style patterns from the latter. However, it is difficult to simultaneously keep well the trade-off between the content details and the style features. To stylize the image with sufficient style patterns, the content details may be damaged and sometimes the objects of images can not be distinguished clearly. For this reason, we present a new transformer-based method named STT for image style transfer and an edge loss which can enhance the content details apparently to avoid generating blurred results for excessive rendering on style features. Qualitative and quantitative experiments demonstrate that STT achieves comparable performance to state-of-the-art image style transfer methods while alleviating the content leak problem.
translated by 谷歌翻译
With the increasing ability of large language models (LLMs), in-context learning (ICL) has become a new paradigm for natural language processing (NLP), where LLMs make predictions only based on contexts augmented with a few training examples. It has been a new trend exploring ICL to evaluate and extrapolate the ability of LLMs. In this paper, we aim to survey and summarize the progress, challenges, and future work in ICL. We first present a formal definition of ICL and clarify its correlation to related studies. Then, we organize and discuss advanced techniques of ICL, including training strategies, prompting strategies, and so on. Finally, we present the challenges of ICL and provide potential directions for further research. We hope our work can encourage more research on uncovering how ICL works and improving ICL in future work.
translated by 谷歌翻译
Gaze estimation is the fundamental basis for many visual tasks. Yet, the high cost of acquiring gaze datasets with 3D annotations hinders the optimization and application of gaze estimation models. In this work, we propose a novel Head-Eye redirection parametric model based on Neural Radiance Field, which allows dense gaze data generation with view consistency and accurate gaze direction. Moreover, our head-eye redirection parametric model can decouple the face and eyes for separate neural rendering, so it can achieve the purpose of separately controlling the attributes of the face, identity, illumination, and eye gaze direction. Thus diverse 3D-aware gaze datasets could be obtained by manipulating the latent code belonging to different face attributions in an unsupervised manner. Extensive experiments on several benchmarks demonstrate the effectiveness of our method in domain generalization and domain adaptation for gaze estimation tasks.
translated by 谷歌翻译
Generalizability to unseen forgery types is crucial for face forgery detectors. Recent works have made significant progress in terms of generalization by synthetic forgery data augmentation. In this work, we explore another path for improving the generalization. Our goal is to reduce the features that are easy to learn in the training phase, so as to reduce the risk of overfitting on specific forgery types. Specifically, in our method, a teacher network takes as input the face images and generates an attention map of the deep features by a diverse multihead attention ViT. The attention map is used to guide a student network to focus on the low-attended features by reducing the highly-attended deep features. A deep feature mixup strategy is also proposed to synthesize forgeries in the feature domain. Experiments demonstrate that, without data augmentation, our method is able to achieve promising performances on unseen forgeries and highly compressed data.
translated by 谷歌翻译
The development of deep learning models in medical image analysis is majorly limited by the lack of large-sized and well-annotated datasets. Unsupervised learning does not require labels and is more suitable for solving medical image analysis problems. However, most of the current unsupervised learning methods need to be applied to large datasets. To make unsupervised learning applicable to small datasets, we proposed Swin MAE, which is a masked autoencoder with Swin Transformer as its backbone. Even on a dataset of only a few thousand medical images and without using any pre-trained models, Swin MAE is still able to learn useful semantic features purely from images. It can equal or even slightly outperform the supervised model obtained by Swin Transformer trained on ImageNet in terms of the transfer learning results of downstream tasks. The code will be publicly available soon.
translated by 谷歌翻译
Remote sensing of the Earth's surface water is critical in a wide range of environmental studies, from evaluating the societal impacts of seasonal droughts and floods to the large-scale implications of climate change. Consequently, a large literature exists on the classification of water from satellite imagery. Yet, previous methods have been limited by 1) the spatial resolution of public satellite imagery, 2) classification schemes that operate at the pixel level, and 3) the need for multiple spectral bands. We advance the state-of-the-art by 1) using commercial imagery with panchromatic and multispectral resolutions of 30 cm and 1.2 m, respectively, 2) developing multiple fully convolutional neural networks (FCN) that can learn the morphological features of water bodies in addition to their spectral properties, and 3) FCN that can classify water even from panchromatic imagery. This study focuses on rivers in the Arctic, using images from the Quickbird, WorldView, and GeoEye satellites. Because no training data are available at such high resolutions, we construct those manually. First, we use the RGB, and NIR bands of the 8-band multispectral sensors. Those trained models all achieve excellent precision and recall over 90% on validation data, aided by on-the-fly preprocessing of the training data specific to satellite imagery. In a novel approach, we then use results from the multispectral model to generate training data for FCN that only require panchromatic imagery, of which considerably more is available. Despite the smaller feature space, these models still achieve a precision and recall of over 85%. We provide our open-source codes and trained model parameters to the remote sensing community, which paves the way to a wide range of environmental hydrology applications at vastly superior accuracies and 2 orders of magnitude higher spatial resolution than previously possible.
translated by 谷歌翻译
We study the composition style in deep image matting, a notion that characterizes a data generation flow on how to exploit limited foregrounds and random backgrounds to form a training dataset. Prior art executes this flow in a completely random manner by simply going through the foreground pool or by optionally combining two foregrounds before foreground-background composition. In this work, we first show that naive foreground combination can be problematic and therefore derive an alternative formulation to reasonably combine foregrounds. Our second contribution is an observation that matting performance can benefit from a certain occurrence frequency of combined foregrounds and their associated source foregrounds during training. Inspired by this, we introduce a novel composition style that binds the source and combined foregrounds in a definite triplet. In addition, we also find that different orders of foreground combination lead to different foreground patterns, which further inspires a quadruplet-based composition style. Results under controlled experiments on four matting baselines show that our composition styles outperform existing ones and invite consistent performance improvement on both composited and real-world datasets. Code is available at: https://github.com/coconuthust/composition_styles
translated by 谷歌翻译